Intrinsic Geometry of Stochastic Gradient Descent Algorithms

Authors

  • Robert E. Mahony
  • Krzysztof A. Krakowski
  • Robert C. Williamson
Abstract

We consider the intrinsic geometry of stochastic gradient descent (SG) algorithms. We show how to derive SG algorithms that fully respect an underlying geometry, which can be induced either by prior knowledge in the form of a preferential structure or by a generative model via the Fisher information metric. We show that using the geometrically motivated update and the “correct” loss function, the implicit and explicit discrete-time updates are, under certain conditions, identical. This new loss function reduces to the least-squares loss for linear regression with Gaussian measurement noise. We also show that the seemingly obvious requirement that the loss function be convex is not appropriate in non-flat geometries. We illustrate the power of the new framework by deriving an algorithm for a regression problem over a multinomial distribution.
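To make the flavour of such a geometrically preconditioned update concrete, the sketch below applies a stochastic natural-gradient step to softmax (multinomial) regression, preconditioning the Euclidean gradient by a damped Fisher information matrix. The function names, damping term and step size are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def natural_sgd_step(theta, x, y, lr=0.1, damping=1e-3):
    """One stochastic natural-gradient step for softmax (multinomial) regression.

    theta: (K, d) weight matrix, x: (d,) feature vector, y: class index.
    The Euclidean gradient of the negative log-likelihood is preconditioned by
    the (damped) inverse Fisher information at the current point, so the update
    follows the geometry of the multinomial model rather than the flat
    parameter space.
    """
    K, d = theta.shape
    p = softmax(theta @ x)                    # model probabilities p(y | x, theta)
    onehot = np.eye(K)[y]
    g = np.outer(p - onehot, x).ravel()       # Euclidean gradient, shape (K*d,)

    # Fisher information for one sample: (diag(p) - p p^T) kron (x x^T)
    F_class = np.diag(p) - np.outer(p, p)
    F = np.kron(F_class, np.outer(x, x)) + damping * np.eye(K * d)

    nat_grad = np.linalg.solve(F, g)          # F^{-1} g, the natural gradient
    return theta - lr * nat_grad.reshape(K, d)

# toy usage: one update on a random sample
rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 5))
x, y = rng.normal(size=5), 1
theta = natural_sgd_step(theta, x, y)
```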

Similar Resources

Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities

The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...


Stability and Generalization of Learning Algorithms that Converge to Global Optima

We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions...
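For reference, the Polyak-Łojasiewicz and quadratic-growth conditions mentioned above are usually stated as follows (here f* denotes the global minimum value and X* the set of minimizers); this is the standard textbook form, not text taken from the paper itself.

```latex
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0:
\tfrac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
  \qquad \text{for all } x.

% Quadratic growth (QG) condition with constant \mu > 0:
f(x) - f^{*} \;\ge\; \tfrac{\mu}{2}\,\operatorname{dist}(x,\,X^{*})^{2}
  \qquad \text{for all } x.
```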


Fastest Rates for Stochastic Mirror Descent Methods

Relative smoothness, a notion introduced in [6] and recently rediscovered in [3, 18], generalizes the standard notion of smoothness typically used in the analysis of gradient-type methods. In this work we take ideas from the well-studied field of stochastic convex optimization and use them to obtain faster algorithms for minimizing relatively smooth functions. We propose and analyze ...
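As a reminder of the objects involved (standard definitions, not quoted from the paper): a function f is L-smooth relative to a reference function h if it is upper-bounded through h's Bregman divergence, and the (stochastic) mirror descent step replaces the usual Euclidean proximal term with that divergence.

```latex
% Bregman divergence generated by a convex reference function h:
D_h(y, x) \;=\; h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle.

% f is L-smooth relative to h:
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + L\, D_h(y, x).

% Mirror descent step with stochastic gradient g_k and step size \eta:
x_{k+1} \;=\; \arg\min_{y}\Bigl\{ \langle g_k,\, y \rangle + \tfrac{1}{\eta}\, D_h(y, x_k) \Bigr\}.
```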


Decoupling the Data Geometry from the Parameter Geometry for Stochastic Gradients

Large-scale learning problems require algorithms that scale benignly with respect to the size of the dataset and the number of parameters to be trained, leading numerous practitioners to favor the classic stochastic gradient descent (SGD [1, 2, 3]) over more sophisticated methods. Besides its fast convergence, SGD has been observed to sometimes lead to significantly better generalization perform...


Adaptive On-Line Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information

There are two major approaches for blind separation: Maximum Entropy (ME) and Minimum Mutual Information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the de-mixing matrix. The MI is the contrast function for blind separation while the entropy is not. To justify the ME, the relation between ME and MMI is firstly elucidated by calculating the first derivative...
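A widely used stochastic-gradient rule of this kind is the natural-gradient de-mixing update, sketched below; the nonlinearity, step size and toy data are illustrative assumptions, and this is not necessarily the exact algorithm analysed in the paper.

```python
import numpy as np

def demixing_step(W, x, lr=0.01, phi=np.tanh):
    """One stochastic natural-gradient update of the de-mixing matrix W.

    y = W x are the current source estimates; the update
    W <- W + lr * (I - phi(y) y^T) W is the standard natural-gradient
    rule for blind source separation.
    """
    y = W @ x
    n = W.shape[0]
    return W + lr * (np.eye(n) - np.outer(phi(y), y)) @ W

# toy usage: separate a random 2x2 mixture, one sample at a time
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))                        # unknown mixing matrix
W = np.eye(2)                                      # initial de-mixing estimate
for _ in range(5000):
    s = np.array([rng.laplace(), rng.laplace()])   # super-Gaussian sources
    W = demixing_step(W, A @ s)
```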



Journal title:

Volume   Issue

Pages  -

Publication date: 2005